Abstract

We make improvements to the upper bounds on several popular types of distance 
preserving graph sketches. These sketches are all various restrictions of the additive 
pairwise spanner problem, in which one is given an undirected unweighted graph G, 
a set of node pairs P, and an error allowance $+\beta$, and one must construct a sparse
subgraph H satisfying $\delta_H(u,v) \le \delta_G(u,v) + \beta$ for all $(u,v) \in P$.

The first part of our paper concerns pairwise distance preservers, which make the 
restriction $\beta = 0$ (i.e. distances must be preserved exactly). Our main result here is an
upper bound of $|H| = O(n^{2/3}|P|^{2/3} + n|P|^{1/3})$ when G is undirected and unweighted.
This improves on existing bounds whenever $|P| = \omega(n^{3/4})$, and it is the first such
improvement in the last ten years.

We then devise a new application of distance preservers to graph clustering algorithms,
and we apply this algorithm to subset spanners, which require $P = S \times S$ for
some node subset S, and (standard) spanners, which require $P = V \times V$. For both
of these objects, our construction generalizes the best known bounds when the error 
allowance is constant, and we obtain the strongest polynomial error/sparsity tradeoff 
that has yet been reported (in fact, for subset spanners, ours is the first nontrivial 
construction that enjoys improved sparsity from a polynomial error allowance). 

We leave open a conjecture that $O(n^{2/3}|P|^{2/3} + n)$ pairwise distance preservers are
possible for undirected unweighted graphs. Resolving this conjecture in the affirmative 
would improve and simplify our upper bounds for all the graph sketches mentioned 
above. 
1 Introduction


How much can all graphs be compressed while keeping their distance information roughly 
intact? This question falls within the scope of both metric embeddings and graph theory 
and is fundamental to our understanding of the metric properties of graphs. When the 
compressed version of the graph must be a subgraph, it is called a spanner. Spanners have a 
multitude of applications, essentially everywhere where shortest paths information needs to 
be compressed while still allowing for graph algorithms to be run. The quality of a spanner 
is measured by the tradeoff between its sparsity and its accuracy in preserving the distances. 
There are many different versions of spanners, which we discuss below. 


1.1 Distance Preservers 


One possible formalization of the spanner problem is that the distances must be preserved 
exactly. Unfortunately, it is not always possible to have a sparse spanner of this kind - just 
consider a clique; all edges must be included in the spanner, or else some distance will be 
stretched by at least one edge. Hence, the most studied version in the exact distance setting 
is that only some of the pairwise distances must be preserved exactly. 

Definition ([1] - Pairwise Distance Preservers). Let G = (V,E) be a (possibly directed,
possibly weighted) graph, and let $P \subseteq V \times V$. We say that a subgraph H = (V,E') is a
pairwise distance preserver [CE06] of G, P if

$\delta_H(u,v) = \delta_G(u,v)$

for all $(u,v) \in P$.
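
The definition can be checked mechanically on small examples. The following is a minimal Python sketch, assuming an undirected unweighted graph stored as an adjacency dict; the names `bfs_distances` and `is_distance_preserver` are illustrative choices and do not come from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_distance_preserver(adj_g, adj_h, pairs):
    """True if the subgraph H keeps the exact G-distance for every pair in P."""
    for u, v in pairs:
        # .get returns None for unreachable nodes, so disconnected pairs compare equal
        if bfs_distances(adj_g, u).get(v) != bfs_distances(adj_h, u).get(v):
            return False
    return True


# G is the 4-cycle 0-1-2-3-0; H drops the edge {3, 0}.
G = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
H = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Here H preserves the pair (0, 2) exactly but stretches the pair (0, 3) from distance 1 to distance 3, so it is a preserver for the first pair set only.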

This definition was first posed by Bollobás, Coppersmith, and Elkin [BCE03], who described
the pair set implicitly as $\{(u,v) \mid \delta_G(u,v) \ge D\}$ for some parameter D (such an
object is simply called a D-preserver of G). The same authors showed that $|H| = O(n^2/D)$
edges are sufficient and sometimes necessary to construct a D-preserver. Coppersmith &
Elkin [CE06] later generalized the definition to the above form. They showed upper bounds
of $O(n|P|^{1/2})$ (which applies to possibly directed and weighted graphs) and $O(n + n^{1/2}|P|)$
(which applies only to undirected, but possibly weighted graphs). They also proved a host of
lower bounds; most notably, that a superlinear ($\omega(n + |P|)$) number of edges is necessary
for any distance preserver unless $|P| = O(n^{1/2})$ or $|P| = \Omega(n^2)$. This lower bound holds
even for undirected and unweighted graphs. This implies that for distance preservers for
$\Theta(\sqrt{n})$ pairs of nodes, $\Theta(n)$ edges is both an upper and lower bound.

Distance preservers are fundamental combinatorial objects with many applications. They 
are commonly used as a tool in creating other types of graph spanners [CE06, BCE03,
BW15] (we will discuss some of these shortly). Additionally, they were recently applied by
Elkin & Pettie [EP15] to construct low-stretch path reporting distance oracles. For more
applications, see [EP15] and the references therein.

Although they have been successfully applied to several other important problems, no 
progress on upper or lower bounds for distance preservers themselves has been reported 
since Coppersmith & Elkin’s initial work ten years ago. This paper provides the first such 
progress. 

Theorem ([3] - Sparser Distance Preservers). Let G be an undirected and unweighted graph,
and let $P \subseteq V \times V$. Then there is a pairwise distance preserver of G, P on
$O(n^{2/3}|P|^{2/3} + n|P|^{1/3})$ edges.
Following this result, the best upper bounds for undirected unweighted graphs are:

1. $O(n^{2/3}|P|^{2/3})$ when $|P| = \Omega(n)$ (this paper)

2. $O(n|P|^{1/3})$ when $\Omega(n^{3/4}) = |P| = O(n)$ (this paper)

3. $O(n + n^{1/2}|P|)$ when $|P| = O(n^{3/4})$ [CE06]

We consider it fairly unlikely that this piecewise behavior reflects the true upper bound 
for undirected unweighted pairwise distance preservers. Note that the upper bound
$O(n + n^{2/3}|P|^{2/3})$ is proven both for $|P| = \Omega(n)$ and for $|P| = O(n^{1/2})$ (this bound picks
out the point $|P| = \Theta(n^{1/2})$, $|H| = \Theta(n)$ also realized by the $O(n + n^{1/2}|P|)$ upper bound).
We take this as compelling evidence that this bound is attainable in general.

Conjecture ([1] - Very Sparse Distance Preservers). Let G be an undirected and unweighted
graph, and let $P \subseteq V \times V$. Then there is a pairwise distance preserver of G, P on
$O(n^{2/3}|P|^{2/3} + n)$ edges.



Figure 1: The state of the art after this paper for pairwise distance preservers on 
undirected unweighted graphs. Old upper bounds are in blue, new upper bounds in this 
paper are in solid green, and our conjectured upper bound is shown by the dotted green 
line. The dashed red lines are an infinite family of lower bounds due to Coppersmith &
Elkin [CE06]: any tradeoff southeast of any of these lines is not possible in general.


1.2 Graph Clustering 

On the technical side, another contribution of this paper is a new application of distance 
preservers to graph clustering. There is a rich body of work producing graph clusterings with 
the following general properties: each cluster consists of a central “core” plus a surrounding
shell of non-core nodes, every node belongs to the core of at least one cluster, and the
average node only belongs to O(1) clusters. There are also typically close upper and lower
bounds on the radius of each cluster. Just a few of the clustering algorithms with this sort
of behavior can be found in [AP92, Coh93, PR10].

What these algorithms commonly lack is a nontrivial bound on the total number of 
clusters produced. This makes them difficult to use for certain applications, particularly 
those related to spanners with additive error (called additive spanners). We devise a new 
clustering algorithm that gives us a handle on the number of clusters, and can be
applied to constructing additive spanners. Our approach is roughly as follows. We threshold 
the size of each cluster. Clusters that are smaller than our threshold are called “small,” and 
we use arguments based on distance preserver upper bounds to show that very few edges 
participate in shortest paths through the core of a small cluster. Clusters bigger than our 
threshold are called “large,” and we can limit the total number of large clusters due to the 
lower bound on the number of nodes each one contains. Details of this process can be found 
in the lemmas of Section 4.

Although our underlying clustering technique is similar to prior clustering techniques 
(e.g. region growing), our applications to additive spanners require additional properties 
that do not seem to hold in any prior clustering algorithm. In particular, we need that the 
core of each cluster is a ball of radius r (for some r) around a center node, and that the 
non-core nodes contain the 2r-ball around this center node. 

1.3 Spanners 

The most popular definition of a spanner is that all pairwise distances must be preserved 
up to an error function. 

Definition ([9] - $(\alpha, \beta)$ spanners). An $(\alpha, \beta)$ spanner [Awe85, PS89] of an unweighted,
undirected graph G = (V,E) is a subgraph H satisfying

$\delta_H(u,v) \le \alpha \cdot \delta_G(u,v) + \beta$

for all $u, v \in V$.

Spanners are well-studied combinatorial objects. Some of their applications include
protocol synchronization in unsynchronized networks [PU89a], and the design of low-stretch
routing algorithms which follow particularly compact routing tables [Cow01, CW04, PU89b,
RTZ08, TZ01]. They have also been used to create low space distance oracles [TZ05, BS07,
BK06, RTZ08] and almost-shortest path algorithms [EZ06, Elk05, Elk07, DHZ96]. Mild
variations on graph spanners have appeared in broadcasting [FPZW04], solving diagonally
dominant linear systems [ST04], and more.

Initial work on spanners studied the multiplicative case; i.e. $\beta = 0$. The tradeoff curve
for multiplicative spanners is now very well understood. It was quickly observed [ADD+93]
by Althöfer et al. that one can obtain $(2k - 1, 0)$ spanners on $O(n^{1+1/k})$ edges for any
integer k, and that this tradeoff is optimal assuming the popular Girth Conjecture posed
by Erdős [Erd64]. The construction time was improved in various ways in subsequent work
[RZ04, RTZ05, BS07]. A later direction of research studied mixed spanners, which contain a
tradeoff between their $\alpha$ and $\beta$ terms; see [EP01, TZ06, Pet07] and the references therein. We
have a reasonable understanding of mixed spanners. Like multiplicative spanners, we know
a smooth tradeoff curve between their sparsity and their error. In particular, in [EP04],
Elkin & Peleg show that there are $(1 + \epsilon, \beta_{k,\epsilon})$ spanners on $O(n^{1+1/k})$ edges (note that the
edge count is independent of $\epsilon$): that is, one can produce nearly additive spanners on an
arbitrarily close to linear number of edges. However, there are no known matching lower
bounds for mixed spanners, conditional or otherwise.

In this paper we are concerned with the purely additive case, where $\alpha = 1$. This case
is not well understood. There are three known constructions in which $\beta$ is a constant: +2
spanners on $O(n^{3/2})$ edges (originally $\tilde{O}(n^{3/2})$ in [ACIM99]; the log factors were removed
in [EP04]), +4 spanners on $O(n^{7/5})$ edges [Che13], and +6 spanners on $O(n^{4/3})$ edges
[BKMP05b]. The construction of the +2 spanner was later sped up [DHZ96, RTZ05], and the
+6 spanner construction was sped up [BKMP05a, Woo10], derandomized, and simplified
[Knu14]. However, progress has mysteriously halted at this $n^{4/3}$ threshold: it is currently
open whether or not there exist spanners on $O(n^{4/3-\delta})$ edges, even if the additive error
function can be as large as $+n^{o(1)}$. Breaking this $n^{4/3}$ barrier is considered to be a major
open question [DHZ96, BKMP05b, BKMP05a, Woo10, BW15, Knu14, Che13], but progress
has proved quite difficult. In this sense, additive spanners do not yet enjoy a smooth tradeoff
curve like multiplicative and mixed spanners do.

Meanwhile, current lower bounds for additive spanners allow plenty of room for improvement.
Erdős’ Girth Conjecture again implies that $+(2k - 2)$ spanners require $\Omega(n^{1+1/k})$
edges for any constant k; Woodruff [Woo06] has shown that this same lower bound holds
independent of the Girth Conjecture. This implies that the +2 spanner is tight, but that
the other spanners might be improvable; in particular, it is conceivable that there is a $+\beta_\epsilon$
spanner on $O(n^{1+\epsilon})$ edges for all $\epsilon > 0$.

Given the apparent robustness of the $n^{4/3}$ barrier to progress, researchers have sought
spanners on $n^{4/3-\delta}$ edges with small polynomial amounts of error. This is where our work
lies. The first such spanner [BCE03] had $+O(n^{1-2\epsilon})$ error on $O(n^{1+\epsilon})$ edges for all $\epsilon > 0$.

A series of works improved this error tradeoff: $+O(n^{1-3\epsilon})$ in [BKMP05b],
$+O(n^{9/16-7\epsilon/8})$ [Pet07], $+O(n^{1/2-3\epsilon/2})$ with the restriction $\epsilon \ge 3/17$ [Che13], $+O(n^{1/2-\epsilon/2})$
[BW15], and $+O(n^{2/3-5\epsilon/3})$ [BW15]. Jointly, these last three spanners form the current
state of the art beneath the $n^{4/3}$ threshold. If the $+O(n^{1/2-3\epsilon/2})$ spanner construction
[Che13] worked for all $\epsilon > 0$, it would subsume all other known constructions. Obtaining
this tradeoff for all $\epsilon$ is considered an important open problem [Che13, BW15].

Our work subsumes this open problem, showing that the tradeoff of $O(n^{1+\epsilon})$ edges for
$+O(n^{1/2-3\epsilon/2})$ error is not optimal. Using our novel reduction between distance preservers
and graph clustering, we show:

Theorem ([5] - Sparse Additive Spanners). Suppose that every n-node graph has a pairwise
distance preserver for |P| node pairs on $O(n + n^a|P|^b)$ edges. Then, for all graphs G and all
constants d, there are $+n^{d+o(1)}$ spanners on

$n^{1 + o(1) + \frac{a+2b-1}{a+2b+1} - \frac{d(10b-a+1)}{3(a+2b+1)}}$

edges.

The above theorem implies several new spanner tradeoffs that can be seen in the reference
table below:

Using the distance preserver bound                           The spanner has size

$O(n^{1/2}|P| + n)$ (Coppersmith & Elkin [CE06])              $O(n^{10/7-d})$

$O(n|P|^{1/3})$ if $|P| = O(n)$ (Theorem [3])                 $\tilde{O}(n^{5/4-5d/12})$ if $d \ge 3/13$

$O(n^{2/3}|P|^{2/3})$ if $|P| = \Omega(n)$ (Theorem [3])      $O(n^{4/3-7d/9})$ if $d \le 3/13$

$O(n^{2/3}|P|^{2/3} + n)$ (Conjecture [1])                    $O(n^{4/3-7d/9})$


Our spanners are the sparsest known for all $d > 0$. In particular, our tradeoff is better
than the $n^{1/2-3\epsilon/2}$ tradeoff for all $\epsilon < 1/3$.
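
The table entries follow from substituting each preserver bound into the exponent of Theorem 5. The sketch below is only a sanity check of that arithmetic, using exact rationals and ignoring the o(1) term; the function name `spanner_exponent` is a hypothetical helper, not notation from the paper.

```python
from fractions import Fraction as F


def spanner_exponent(a, b, d):
    """Exponent of n in the edge bound of Theorem 5 (o(1) term dropped),
    given a preserver bound O(n + n^a |P|^b) and additive error n^d."""
    denom = a + 2 * b + 1
    return 1 + (a + 2 * b - 1) / denom - d * (10 * b - a + 1) / (3 * denom)


d = F(1, 10)  # an arbitrary sample error exponent
coppersmith_elkin = spanner_exponent(F(1, 2), F(1), d)   # n^{1/2}|P| + n
sparse_regime = spanner_exponent(F(1), F(1, 3), d)       # n|P|^{1/3}
dense_regime = spanner_exponent(F(2, 3), F(2, 3), d)     # n^{2/3}|P|^{2/3}
```

The two new bounds give the same spanner size exactly at d = 3/13, which matches the case split in the table.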







Figure 2: State of the art for $+\beta$ (polynomial error) additive spanners beneath the
$n^{4/3}$ threshold. Old state-of-the-art upper bounds are in solid blue, and the (previously
open) $n^{1/2-3\epsilon/2}$ bound discussed above is shown by the dotted blue line. Our new
unconditional upper bounds are in solid green, and the upper bound obtained under 
our distance preserver conjecture is shown by the dotted green line. 


1.4 Subset Spanners 

A recent research trend has been to merge the previous two formalizations of the distance 
sparsification problem: only some pairwise distances must be preserved up to an error 
function. 
Definition (Pairwise Spanners). Let G = (V,E) be an undirected unweighted graph, and
let $P \subseteq V \times V$. We say that a subgraph H = (V,E') is a $+\beta$ pairwise spanner of G, P if

$\delta_H(u,v) \le \delta_G(u,v) + \beta$

for all $(u,v) \in P$.

A closely related concept is: 

Definition ([8] - Subset Spanners). Let G = (V,E) be an undirected unweighted graph, and
let $P = S \times S$ for some node subset $S \subseteq V$. If H is a $+\beta$ pairwise spanner of G, P, then
we also say that H is a $+\beta$ subset spanner of G, S.

There are three known constructions for pairwise spanners in their most general form.
These are: a +2 pairwise spanner on $O(n|P|^{1/3})$ edges due to Kavitha & Varma [KV13], a
+4 pairwise spanner on $O(n|P|^{2/7})$ edges due to Kavitha [Kav15], and a +6 pairwise spanner
on $O(n|P|^{1/4})$ edges also due to Kavitha [Kav15]. There is also a +2 subset spanner on
$O(n|S|^{1/2})$ edges due to Cygan, Grandoni, and Kavitha [CGK13]. Obtaining a constant
error subset spanner on $O(n|S|^{1/2-\delta})$ edges (or, by extension, a constant error pairwise
spanner on $O(n|P|^{1/4-\delta})$ edges) would be enough to break the $n^{4/3}$ threshold for standard
spanners discussed above. As such, this task seems very difficult.

Like standard spanners, then, it seems important to achieve a good polynomial
sparsity/error tradeoff below this bound. However, no progress on this task has yet been
reported. The best construction we know is to naively ignore the given pair set and
construct a sparse (standard) spanner with polynomial error. It is an important open question
[CGK13, KV13, BW15] to construct a subset/pairwise spanner that benefits in a natural
way from a polynomial error allowance.

That is exactly what we accomplish, for subset spanners. We prove: 

Theorem ([4] - Sparse Subset Spanners). Let a, b be constants such that there is an upper
bound of $O(n^a|P|^b + n)$ for pairwise distance preservers. Then for any constant d, there is a
construction of $+O(n^d)$ subset spanners on $|H| = O(n) + |S|^{(2b+a-1)/2} n^{1-d(1-a)+o(1)}$ edges.

The following table gives the new bounds obtained using different distance preserver
constructions:

Using the distance preserver bound                           H has size O(n) +

$O(n^{1/2}|P|)$ (Coppersmith & Elkin [CE06])                  $|S|^{3/4} n^{1-d/2+o(1)}$

$O(n|P|^{1/3})$ if $|P| = O(n)$ (Theorem [3])                 $|S|^{1/3} n^{1+o(1)}$ if $|S| = O(n^{2d})$

$O(n^{2/3}|P|^{2/3})$ if $|P| = \Omega(n)$ (Theorem [3])      $|S|^{1/2} n^{1-d/3+o(1)}$ if $|S| = \Omega(n^{2d})$

$O(n + n^{2/3}|P|^{2/3})$ (Conjecture [1])                    $|S|^{1/2} n^{1-d/3+o(1)}$
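
The rows of this table are again direct substitutions, this time into the bound of Theorem 4. As a hedged sanity check (the helper name `subset_spanner_exponents` is an assumption made for illustration), the exponents of |S| and n can be computed exactly:

```python
from fractions import Fraction as F


def subset_spanner_exponents(a, b, d):
    """Return (exponent of |S|, exponent of n) in the second term of
    Theorem 4's bound, with the o(1) term dropped."""
    return (2 * b + a - 1) / 2, 1 - d * (1 - a)


d = F(1, 5)  # an arbitrary sample error exponent
row_ce06 = subset_spanner_exponents(F(1, 2), F(1), d)
row_sparse = subset_spanner_exponents(F(1), F(1, 3), d)
row_dense = subset_spanner_exponents(F(2, 3), F(2, 3), d)
```

Note that with a = 1 the error allowance d drops out of the exponent of n entirely, which is why the second row does not improve with d.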


2 Definitions and Notations 

All graphs in this paper are undirected and unweighted. The variable n is reserved for 
the number of nodes in the graph G currently being discussed. The number of edges in G 
is denoted |G|. 
If G = (V,E) is a graph, then we say P is a pair set on G if $P \subseteq V \times V$. We say that
$H \subseteq G$ is a $+\beta$ pairwise spanner of a graph G and a pair set P if

$\delta_H(u,v) \le \delta_G(u,v) + \beta$

for all $(u,v) \in P$. When $P = V \times V$, we simply say that H is a $+\beta$ spanner of G, or a
$+\beta$ standard spanner if we wish to emphasize its non-pairwise nature. When $P = S \times S$
for some node subset $S \subseteq V$, we say that H is a subset spanner of G, S. When $\beta = 0$
(i.e. the distances are exactly preserved), we say that H is a pairwise distance preserver (or
sometimes just preserver for brevity) of G, P.

We use the notation $\delta_G(u,v)$ to refer to the shortest path distance between u and v in
the graph G. For a node u in G, we denote by $B_{\le}(u,r)$ the set of nodes at distance r or
less from u. Similarly, $B_{<}(u,r)$ is the set of nodes at distance strictly less than r from u,
and $B_{=}(u,r)$ is the set of nodes at distance exactly r from u.
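
For concreteness, the three ball sets can be computed with a breadth-first search truncated at depth r. This is a small illustrative sketch on an adjacency-dict graph; the function names are assumptions, not the paper's notation.

```python
from collections import deque


def distances_within(adj, u, r):
    """Hop distances from u to all nodes at distance at most r."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue  # do not expand past radius r
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def ball_le(adj, u, r):
    """Nodes at distance r or less from u."""
    return set(distances_within(adj, u, r))


def ball_lt(adj, u, r):
    """Nodes at distance strictly less than r from u."""
    return {x for x, k in distances_within(adj, u, r).items() if k < r}


def ball_eq(adj, u, r):
    """Nodes at distance exactly r from u."""
    return {x for x, k in distances_within(adj, u, r).items() if k == r}


path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```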

3 Pairwise Distance Preservers 

Recall the following definition from the introduction: 

Definition 1. Given a graph G and a pair set $P \subseteq V \times V$, we say that a subgraph H is a
pairwise distance preserver of G with respect to P if $\delta_H(u,v) = \delta_G(u,v)$ for all $(u,v) \in P$.

Prior work has considered distance preservers on possibly directed or weighted G, but 
we will restrict our attention to the undirected and unweighted case. 

One can imagine a pair set in which each pair $(u,v) \in P$ has a unique shortest path
in G. In this case, there is no room for algorithmic cleverness in the construction of the
preserver H; it is necessary that H is exactly the union of these shortest paths. The entire
algorithmic component of the problem lies in path tiebreaking: if there is a pair (u,v) such
that G contains several equally short paths between u and v, then we need to choose which
one of these to include in our preserver. We formalize this as follows:

Definition 2. A path tiebreaking scheme on a graph G is a function $p_G$ that maps node
pairs (u,v) to a shortest path in G from u to v.

Given a graph G and a pair set P, one can construct a distance preserver by simply
choosing a tiebreaking scheme $p_G$, and then setting $H = \bigcup_{p \in P} p_G(p)$. No generality is lost
in this approach.
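
The reduction from preservers to tiebreaking can be made concrete as follows. This is a minimal sketch, assuming an adjacency-dict graph; the BFS rule in `shortest_path` (ties broken by neighbor order) is just one arbitrary tiebreaking scheme chosen for illustration and is not the scheme analyzed later in this section.

```python
from collections import deque


def shortest_path(adj, u, v):
    """One shortest u-v path as a node list, or None if v is unreachable.
    Ties are broken by the order in which neighbors are listed."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    if v not in parent:
        return None
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def preserver_from_tiebreaking(adj, pairs, scheme=shortest_path):
    """Edge set of H: the union of the chosen shortest paths over all pairs."""
    edges = set()
    for u, v in pairs:
        path = scheme(adj, u, v)
        if path is not None:
            edges.update(frozenset(e) for e in zip(path, path[1:]))
    return edges


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
H_edges = preserver_from_tiebreaking(cycle, [(0, 2)])
```

On the 4-cycle, the pair (0, 2) has two shortest paths; this particular rule selects 0-1-2, so the preserver consists of the two edges on that path.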

A major theme of this section is the difference in power between various tiebreaking 
schemes. 

3.1 Old Tiebreaking Schemes 

Coppersmith & Elkin’s upper bound of $O(n|P|^{1/2})$ is realized regardless of the tiebreaking
scheme used. Their other upper bound of $O(n + \sqrt{n}|P|)$ is realized only by tiebreaking
schemes with the following property: 

Definition 3. A tiebreaking scheme $p_G$ is consistent if, whenever $w, x \in p_G(u,v)$, we have
$p_G(w,x) \subseteq p_G(u,v)$.
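
Consistency is easy to test on an explicitly given scheme. The sketch below assumes the scheme is supplied as a (possibly partial) dict from ordered pairs to shortest paths given as node lists, and checks the condition only for pairs on which the scheme is defined; both this representation and the name `is_consistent` are illustrative assumptions. Since a shortest w-x path contained in the path for (u,v) must be the segment between w and x, the check compares against that segment.

```python
def is_consistent(scheme):
    """scheme: dict mapping (u, v) to a shortest u-v path as a list of nodes."""
    for path in scheme.values():
        position = {node: i for i, node in enumerate(path)}
        for (w, x), sub in scheme.items():
            if w in position and x in position:
                lo, hi = sorted((position[w], position[x]))
                segment = path[lo:hi + 1]
                # The path chosen for (w, x) must be this segment, in either direction.
                if sub != segment and sub != segment[::-1]:
                    return False
    return True


# Two schemes on the 4-cycle 0-1-2-3-0.
good = {(0, 2): [0, 1, 2], (0, 1): [0, 1], (1, 2): [1, 2]}
bad = {(0, 2): [0, 1, 2], (2, 0): [2, 3, 0]}
```

The second scheme is inconsistent because it routes 0 to 2 through node 1 but routes 2 to 0 through node 3.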

They also use a slight variant on the following definition: 



Definition 4. Let H be an undirected graph. We say that H has b branching events if

$b = \min \sum_{v \in V} \binom{\deg_{in}(v)}{2}$

where the min is taken over ways to direct the edges of H.
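
To make the counting concrete, the sketch below counts branching events for one fixed orientation, under the assumption (in the spirit of Coppersmith & Elkin) that a branching event is a pair of distinct edges directed into the same node. Any single orientation gives an upper bound on the minimum over orientations; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def branching_events(directed_edges):
    """Number of pairs of distinct edges entering a common node,
    for edges given as (tail, head) tuples."""
    indegree = Counter(head for _tail, head in set(directed_edges))
    return sum(comb(k, 2) for k in indegree.values())


# Three edges enter node 2, and one edge leaves it.
example = [(0, 2), (1, 2), (3, 2), (2, 4)]
```

In the example, node 2 has in-degree 3 and so contributes three pairs of entering edges, while node 4 contributes none.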

Informally speaking, the number of branching events in $H = \bigcup_{p \in P} p_G(p)$ captures the
number of times two paths $p_G(p)$ intersect each other and then “branch” back apart. The
following lemma (also due to Coppersmith & Elkin) explains why this is a useful quantity 
to consider: 

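Lemma 1. A graph H with b branching events contains $O(n + (nb)^{1/2})$ edges.

Proof. Direct the edges of H so as to realize its b branching events. The in-degrees sum
to |H|, so by a convexity argument, we have

$b = \sum_{v \in V} \binom{\deg_{in}(v)}{2} \ge \frac{|H|}{2} \left( \frac{|H|}{n} - 1 \right)$

Assuming $|H| \ge 2n$, we have

$b = \Omega(n(|H|/n)^2) = \Omega(|H|^2/n)$

Therefore, if $|H| \ge 2n$, we have $\sqrt{bn} = \Omega(|H|)$. So $|H| = O(n + \sqrt{bn})$. □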

The proof of the $O(n + n^{1/2}|P|)$ upper bound is now straightforward. Let $H = \bigcup_{p \in P} p_G(p)$
be the distance preserver of G, P. If $p_G$ is a consistent tiebreaking scheme, it is not too
hard to see that any pair of paths $p_G(p_1)$ and $p_G(p_2)$ can contribute at most two branching
events to H, and therefore H has only $O(|P|^2)$ branching events. The $O(n + n^{1/2}|P|)$ upper
bound then follows from Lemma 1.

We now know that any consistent tiebreaking scheme implements the Coppersmith & 
Elkin upper bounds of $O(\min\{n + n^{1/2}|P|, n|P|^{1/2}\})$. Looking forward, how can these upper
bounds be improved? There are two possible directions of research. Perhaps (1) there are 
stronger upper bounds that apply to consistent tiebreaking schemes, and we just need to 
refine our proofs. Or maybe (2) we have exhausted the potential of the consistency definition, 
and we will need to invent some new tiebreaking schemes in order to move forward. Our 
first original result is that the answer is (2): the Coppersmith & Elkin bounds are tight for 
consistent tiebreaking schemes. 

Theorem 1. For infinitely many n and any parameter $1/2 \le c \le 1$, there is an unweighted,
undirected graph G on n nodes, a pair set P of size $n^c$, and a consistent tiebreaking scheme
$p_G$ such that

$\left| \bigcup_{p \in P} p_G(p) \right| = n^{1/2}|P|$

Proof. Let $q = n^{1/2}$ be a prime. Let G be the complete graph on q layers; that is, it consists
of q layers of q nodes, with edges placed such that a node in layer L is adjacent to exactly
the set of nodes in layer L - 1 (if $L \ne 1$) and L + 1 (if $L \ne q$). Let P be any set of pairs
(u,v) such that u is in layer 1 and v is in layer q. Number the nodes in each layer from 0 to
q - 1. Define $p_G$ by the following rule: if u is the $i^{th}$ node in the first layer, and v is the $j^{th}$
node in the last layer, then $p_G(u,v)$ is the path that repeatedly travels from the $k^{th}$ node
in the $L^{th}$ layer to the $((k + (i - j)) \bmod q)^{th}$ node in the $(L + 1)^{th}$ layer.

Figure 3: The graph described in Theorem 1 with $n^{1/2} = 7$ (not pictured: all possible
edges between any two adjacent layers). We use $P = L_1 \times L_7$ (or any subset of this, if
c < 1). The first four paths $p_G(p)$ that start at the first node in $L_1$ have been drawn
on the graph. Note that each pair intersects on only one node.

We claim that no two paths $p_G(p_1)$, $p_G(p_2)$ intersect on more than a single node. To see
this: suppose that $p_G(w,x)$, $p_G(u,v)$ share the $a^{th}$ node in layer L and also the $b^{th}$ node in
layer L' > L. Then

$a + (w - x)(L' - L) \equiv b \equiv a + (u - v)(L' - L) \pmod{q}$

(where integers a, b, u, v, w, x stand in for the numbering of the nodes a, b, u, v, w, x in their
respective layers). Since q is prime and 0 < L' - L < q, we can reduce this equation to
$w - x \equiv u - v \pmod{q}$. We then have:

$w + (w - x)L \equiv a \equiv u + (w - x)L \pmod{q}$

and so w = u. This implies that (w,x) = (u,v), and so in fact these paths are identical.

Since each pair of paths intersects on only 1 or 0 nodes, it is clear that $p_G$ is consistent.
Additionally, this condition implies that no two paths share an edge. Since $\delta_G(p) = n^{1/2}$
for all $p \in P$, each path adds exactly $n^{1/2}$ edges to the preserver, and the claim follows. □
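
The construction is small enough to verify by brute force for a fixed prime q. The following sketch builds every path of the tiebreaking rule above (with layers indexed from 0 rather than 1, an illustrative choice) and checks that distinct paths share at most one node, so that no edge is shared. The function names are assumptions made for this example.

```python
from itertools import combinations


def layered_paths(q):
    """All q*q paths of the rule: the path for (i, j) sits at index
    (i + (i - j) * t) mod q in layer t, for t = 0, ..., q - 1."""
    return {
        (i, j): [(t, (i + (i - j) * t) % q) for t in range(q)]
        for i in range(q)
        for j in range(q)
    }


def max_shared_nodes(paths):
    """Largest number of nodes shared by any two distinct paths."""
    return max(len(set(a) & set(b)) for a, b in combinations(paths.values(), 2))


def union_edge_count(paths):
    """Number of distinct edges in the union of all paths."""
    edges = set()
    for path in paths.values():
        edges.update(zip(path, path[1:]))
    return len(edges)


q = 5
paths = layered_paths(q)
```

For q = 5 there are 25 pairs, each path has q - 1 = 4 edges, and since no edge is shared the union has 100 edges. The exact per-path count is q - 1 rather than q, which does not affect the asymptotic claim.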

Theorem 2. For infinitely many n and any parameter $1 \le c \le 2$, there is an unweighted,
undirected graph G on n nodes, a pair set P of size $n^c$, and a consistent tiebreaking scheme
$p_G$ such that

$\left| \bigcup_{p \in P} p_G(p) \right| = n|P|^{1/2}$

Proof. Let $q = n^{c/2}$ be a prime. Construct the complete graph on n/q layers of q nodes
each, and choose your pair set to be any appropriately-sized set of node pairs such that each
pair has one node in the first layer and the other node in the last layer. The proof is now
identical to that of Theorem 1. □

3.2 New Tiebreaking Schemes 

We will next prove a new upper bound of $O(n^{2/3}|P|^{2/3} + n|P|^{1/3})$. By the theorems
above, this improvement will require a new tiebreaking scheme. This scheme is contained
in the following lemma:

Lemma 2. Let G be an unweighted undirected graph, and let S be a subset of nodes such
that every pair of nodes in S is distance d or less apart. Let P be a pair set such that every
pair in P has a shortest path incident on S. Then there is a tiebreaking scheme $p_G$ such
that the graph $H = \bigcup_{p \in P} p_G(p)$ has $O(n + (n|P||S|d)^{1/2})$ edges.

Proof. By Lemma 1, it suffices to prove that H has $O(|P||S|d)$ branching events. We will
do exactly that. Let $H = (V, \emptyset)$ be a distance preserver that we will build iteratively. Assign
each pair $p \in P$ to a node $u \in S$ such that p has a shortest path through u; we say that u
owns p. Expand the pair set as follows: if (a,b) is in the pair set and is owned by node u,
replace it with two pairs (u,a) and (u,b). We will add a shortest path to our preserver for
each pair in this expanded pair set, and for purposes of counting branching events, we will
direct each edge from the node closer to u to the node closer to a/b.

Fix an ordering of the nodes in S, and add all paths that belong to an earlier node 
before adding any paths that belong to a later node. For each node $u \in S$ in order, start
adding its paths to H according to any consistent tiebreaking scheme. We will maintain the 
following invariant: for each previously added path p belonging to a node v that precedes u 
in the ordering, at most 2d + 1 paths belonging to u branch with p. If we ever add a path 
belonging to u that violates this invariant, we will pause the algorithm and reroute one or 
more of these 2d + 2 paths to restore the invariant. 

Suppose that there are 2d + 2 paths belonging to u that have each added a distinct edge
entering some previously added path p, owned by node v. Let $v_1, \ldots, v_{2d+2}$ be distinct
nodes in p on which a path owned by u adds an edge, ordered by distance from v (so
$\delta_G(v, v_1) < \delta_G(v, v_2)$ and so on). By the triangle inequality, we have for all $1 \le j \le 2d + 2$:

$\delta_G(u,v) \ge \delta_G(u,v_j) - \delta_G(v,v_j) \ge -\delta_G(u,v)$

We also know $\delta_G(u,v) \le d$, so we can write

$d \ge \delta_G(u,v_j) - \delta_G(v,v_j) \ge -d$

These differences take at most 2d + 1 distinct integer values, so by the pigeonhole
principle, there exist values $1 \le j < k \le 2d + 2$ with

$\delta_G(u,v_j) - \delta_G(v,v_j) = \delta_G(u,v_k) - \delta_G(v,v_k)$

And so, since $v_j$ precedes $v_k$ on the shortest path p starting at v,

$\delta_G(u,v_j) + \delta_G(v,v_k) - \delta_G(v,v_j) = \delta_G(u,v_k)$
$\delta_G(u,v_j) + \delta_G(v_j,v_k) = \delta_G(u,v_k)$
(a) Suppose that $u, v \in S$, with v preceding u in the ordering, and let p be a path owned by v.
If paths owned by u enter p at 2d + 2 or more different points ...

(b) ... then we can reroute one of these paths, without stretching its length, so that it coincides
with another path up until it reaches p (in this picture, we have rerouted green into orange).

Figure 4: A graphical depiction of the “rerouting” technique from Lemma 2.


We may therefore replace the prefix $p_G(u,v_k)$ of all paths that first intersect p at the node
$v_k$ with the new prefix $p_G(u,v_j) \cup p_G(v_j,v_k)$, and this replacement will not stretch any of
these paths. In doing so, we now have that no paths owned by u intersect p at the node $v_k$,
and so the invariant is restored.

Note that when we perform this rerouting, we cannot introduce any new edges to the
preserver; therefore, when we repair the invariant on the path p, we will not destroy the
invariant on any other path. □

With this lemma in hand, we can now prove our new upper bound. 

Theorem 3. For any undirected unweighted graph G and pair set P, there is a tiebreaking
scheme $p_G$ such that

$\left| \bigcup_{p \in P} p_G(p) \right| = O(n^{2/3}|P|^{2/3} + n|P|^{1/3})$

Proof. Let $\epsilon$ be a parameter. Start adding paths from P to your preserver in any order,
according to any tiebreaking scheme you like. Suppose that at some point during this
process, a node u gains the following property: there exists a set of at most $n^\epsilon$ nodes
within distance 1 of u such that at least $n^{2\epsilon}$ distinct paths pass through one of these nodes.
We then remove exactly $n^{2\epsilon}$ of these paths from the preserver and create an auxiliary
preserver that handles only these paths. We can now apply Lemma 2 to these paths with
$d = 2$, $|S| \le n^\epsilon$, $|P| = n^{2\epsilon}$. Therefore, the auxiliary preserver has $O(n + n^{1/2+3\epsilon/2})$ edges.

At the end of this process, we have some number of auxiliary preservers, plus a “leftover” 
preserver full of paths that were never removed by the above process. We will next argue 
that the leftover preserver has only $O(n^{1+\epsilon})$ edges. The leftover preserver has the property
that, for all nodes v, there is no set of $n^\epsilon$ nodes within distance 1 of v such that at least $n^{2\epsilon}$
distinct paths pass through one of these nodes. Unmark all nodes and all edges. Repeat
the following process until you can do so no longer: 

1. Choose an unmarked node v. 

2. If v has fewer than $n^\epsilon$ unmarked neighbors, then mark v and all its incident edges.

3. If v has at least $n^\epsilon$ unmarked neighbors, then choose $n^\epsilon$ of its unmarked neighbors,
and mark all of these nodes and their incident edges.

Once we have marked all nodes, it is clear that we have also marked all edges. Each time
we mark a single node, we mark at most $n^\epsilon$ edges along with it. Each time we mark a set
of $n^\epsilon$ nodes, we mark at most $4n^{2\epsilon}$ edges along with it (the edges belonging to $n^{2\epsilon}$ paths
incident on this set). Therefore the graph has $O(n^\epsilon)$ times as many edges as it has nodes.
So the leftover preserver has $O(n^{1+\epsilon})$ edges.
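
The marking process itself is simple to simulate. The sketch below is only an illustration of the three steps on an adjacency-dict graph, with an integer threshold `t` standing in for $n^\epsilon$ (assumed to be at least 1); it does not reproduce the edge-counting argument, and the name `mark_all` is hypothetical.

```python
def mark_all(adj, t):
    """Run the marking process with threshold t >= 1 and return the marked edges."""
    unmarked = set(adj)
    marked_edges = set()
    while unmarked:
        v = next(iter(unmarked))                       # step 1: any unmarked node
        neighbors = [w for w in adj[v] if w in unmarked]
        if len(neighbors) < t:
            batch = [v]                                # step 2: mark v itself
        else:
            batch = neighbors[:t]                      # step 3: mark t unmarked neighbors
        for x in batch:
            unmarked.discard(x)
            marked_edges.update(frozenset((x, y)) for y in adj[x])
    return marked_edges


# A star centered at 0 with leaves 1..4, plus the extra edge {1, 2}.
star = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
all_edges = {frozenset((u, w)) for u in star for w in star[u]}
```

Every round marks at least one node, so the loop terminates, and every edge is marked when its first endpoint is marked.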

We will next bound the size of the auxiliary preservers. First suppose that $\epsilon \le 1/3$, and
so the size of each auxiliary preserver is O(n). We then set $n^\epsilon = |P|^{1/3}$. The size of the
leftover preserver is then $O(n|P|^{1/3})$. Additionally, each auxiliary preserver handles $|P|^{2/3}$
paths, and so at most $|P|^{1/3}$ of them exist, so (by a union bound) the total size of the
auxiliary preservers is $O(n|P|^{1/3})$. The total size of the leftover plus auxiliary preservers is
then $O(n|P|^{1/3})$.

Finally, suppose that $\epsilon > 1/3$, and so the size of each auxiliary preserver is $O(n^{1/2+3\epsilon/2})$.
We then set $n^\epsilon = |P|^{2/3}/n^{1/3}$. The size of the leftover preserver is then $O(n^{2/3}|P|^{2/3})$.
Additionally, each auxiliary preserver handles $|P|^{4/3}/n^{2/3}$ paths, and so we can have at
most $n^{2/3}/|P|^{1/3}$ auxiliary preservers. Each one costs O(|P|) edges, and so (by a union
bound) the total size of the auxiliary preservers is $O(n^{2/3}|P|^{2/3})$. The total size of the
leftover plus auxiliary preservers is then $O(n^{2/3}|P|^{2/3})$.

Regardless of the value of $\epsilon$, then, the total size of the distance preserver can be expressed
as $O(n^{2/3}|P|^{2/3} + n|P|^{1/3})$. □

The best known upper bounds are now $O(n^{1/2}|P|)$ when $\Omega(n^{1/2}) = |P| = O(n^{3/4})$, then
$O(n|P|^{1/3})$ when $\Omega(n^{3/4}) = |P| = O(n)$, then $O(n^{2/3}|P|^{2/3})$ when $\Omega(n) = |P| = O(n^2)$. We
consider it fairly unlikely that this piecewise behavior reflects the “true” distance preserver 
upper bound. 
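The piecewise behavior can be checked with a little exact arithmetic. The sketch below is only an illustration: it writes $|P| = n^p$ and compares exponents of n, using the additive form $O(n + n^{1/2}|P|)$ of the Coppersmith & Elkin bound; the function names are hypothetical and not taken from the paper.

```python
from fractions import Fraction as F

def best_exponent(p):
    """Exponent x such that the best listed bound is n^x when |P| = n^p."""
    coppersmith_elkin = max(F(1), F(1, 2) + p)   # O(n + n^{1/2} |P|)
    if p <= 1:
        new_bound = 1 + p / 3                    # O(n |P|^{1/3}), valid for |P| = O(n)
    else:
        new_bound = F(2, 3) + 2 * p / 3          # O(n^{2/3} |P|^{2/3}), valid for |P| = Omega(n)
    return min(coppersmith_elkin, new_bound)

def conjectured_exponent(p):
    """Exponent of n under the conjectured bound O(n + n^{2/3} |P|^{2/3})."""
    return max(F(1), F(2, 3) + 2 * p / 3)

# Print the two exponents side by side for a few pair-set sizes |P| = n^p.
for p in (F(1, 2), F(3, 4), F(1), F(3, 2), F(2)):
    print(p, best_exponent(p), conjectured_exponent(p))
```

The two new regimes meet the older bound exactly at $|P| = n^{3/4}$ and meet each other at $|P| = n$, while the conjectured curve lies strictly below the current one for $n^{1/2} < |P| < n$.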

Conjecture 1. Every unweighted, undirected graph G and pair set P admits a pairwise
distance preserver on $O(n + n^{2/3}|P|^{2/3})$ edges.

See Figure [T] in the introduction for a visualization of these bounds. 

Throughout the rest of this paper, we will reserve a and b for the following purpose: 
Definition 5. We define a, b to be constants such that one can always construct distance
preservers on $O(n + n^a |P|^b)$ edges.

This allows us to prove general results in terms of a and b, and then substitute in any 
preserver upper bound at the end. 

4 Graph Clustering from Pairwise Distance Preservers 

4.1 Graph Clustering 

We begin with the following clustering algorithm: 

Lemma 3. Let G = (V,E) be an undirected unweighted graph, and let r be a parameter. In
polynomial time, one can find a set of nodes $v_1, \dots, v_k$ (called “cluster centers”) and a set
of integers $r_1, \dots, r_k$, with $r \le r_i \le r \cdot n^{o(1)}$, such that the following properties hold:

1. For each node $v \in V$, there is an i such that $v \in B_{\le}(v_i, r_i)$.

2. $\sum_{i=1}^{k} |B_{\le}(v_i, 2r_i)| = \tilde{O}(n)$.

The set $B_{\le}(v_i, 2r_i)$ is called the “cluster” centered at $v_i$ (also denoted $X_i$), and the set
$B_{\le}(v_i, r_i)$ is called the “core” of the cluster (also denoted $C_i$).

This lemma is very similar to many previously known region-growing algorithms (see 
[cite, cite] for example). The additional structure we need, which forces us to devise a new 
algorithm rather than recycling an old one, is that the core of each cluster is padded by 
non-core nodes for at least $r_i$ distance in every direction.

Proof. First, for every node $v \in V$, we will compute a value $r_v$. Initialize $r_v \leftarrow r$. Check
to see if $|B_{\le}(v, r_v)| \log n > |B_{\le}(v, 4r_v)|$. If so, fix $r_v$ at its current value and move on to
the next node $v \in V$. If not, set $r_v \leftarrow 4r_v$ and repeat. In each iteration of the process,
we multiply $r_v$ by 4 while we multiply $|B_{\le}(v, r_v)|$ by at least $\log n$. Since $|B_{\le}(v, r_v)| \le n$
at all times, we iterate at most $\frac{\log n}{\log\log n}$ times, and so the final value of $r_v$ is at most

$$r \cdot 4^{(\log n)/(\log\log n)} = r \cdot n^{o(1)}.$$

Sort all nodes $v \in V$ descendingly by the value of $r_v$. Now, repeat the following process
until you can do so no longer:

1. Remove the first remaining node v from the list, and add it to your set of cluster
centers. Set its corresponding $r_i$ value to be $2r_v$.

2. For each node u with $B_{\le}(u, r_u) \cap B_{\le}(v, r_v) \ne \emptyset$, delete u from the list.

We claim that we have generated a set of cluster centers with all desired properties. We
have already shown that $r \le r_i \le r \cdot n^{o(1)}$ for all i. Next, we will show that for all
$v \in V$, there is an i such that $v \in B_{\le}(v_i, r_i)$. If v is a cluster center, then the claim is
trivial. Otherwise, there must be some cluster center $v_i$ that preceded v in the list with
the property that $B_{\le}(v_i, r_{v_i}) \cap B_{\le}(v, r_v) \ne \emptyset$. By the triangle inequality, this implies that
$\delta_G(v_i, v) \le r_{v_i} + r_v \le 2r_{v_i} = r_i$, which implies the claim.

Finally, we must show that $\sum_{i=1}^{k} |B_{\le}(v_i, 2r_i)| = \tilde{O}(n)$. Note that the sets $B_{\le}(v_i, r_i/2)$
(where $v_i$ is a cluster center) are disjoint. We then have

$$\sum_{i=1}^{k} |B_{\le}(v_i, 2r_i)| \le \log n \cdot \sum_{i=1}^{k} |B_{\le}(v_i, r_i/2)| \le n \log n,$$

implying the claim. □
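The procedure in this proof can be sketched directly. The following is a minimal, unoptimized illustration rather than the authors' implementation: it assumes the graph is given as an adjacency dictionary `adj`, uses a base-2 logarithm floored at 2 so that the radius-growing loop always terminates, and the names `ball` and `cluster` are hypothetical.

```python
import math
from collections import deque

def ball(adj, src, radius):
    """Nodes within `radius` hops of `src` (BFS on an unweighted graph)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if dist[x] == radius:
            continue  # do not expand past the requested radius
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def cluster(adj, r):
    """Return a list of (center, r_i) pairs following the proof of Lemma 3."""
    n = len(adj)
    log_n = max(math.log2(n), 2.0)  # assumption: base-2 log, floored at 2
    r_v = {}
    for v in adj:
        rv = r
        # Multiply r_v by 4 while the ball still grows by a log n factor.
        while len(ball(adj, v, rv)) * log_n <= len(ball(adj, v, 4 * rv)):
            rv *= 4
        r_v[v] = rv
    small_ball = {v: ball(adj, v, r_v[v]) for v in adj}
    remaining = sorted(adj, key=lambda v: -r_v[v])  # descending by r_v
    centers = []
    while remaining:
        v = remaining.pop(0)
        centers.append((v, 2 * r_v[v]))  # the cluster radius value r_i is 2 r_v
        # Delete every node whose ball B(u, r_u) intersects B(v, r_v).
        remaining = [u for u in remaining if not (small_ball[u] & small_ball[v])]
    return centers
```

On a path with 30 nodes and `r = 1`, for example, this selects every third node as a center with $r_i = 2$, so the cores cover the path and the balls of radius $r_i/2$ are disjoint.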

We will add some machinery to this clustering algorithm to make it useful for spanner 
creation. We will make the following distinction in cluster size: 

Definition 6. A cluster X is large with respect to a parameter $\mathcal{E}$ if $|X| \ge r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}$,
or small otherwise.

Here is a reference table for deciphering the exponents:

Using the distance preserver bound                         | A large cluster has size
$O(n + n^{1/2}|P|)$ (Coppersmith & Elkin [CE06])           | $\Omega(r^{4/3}\mathcal{E}^{2/3})$
$O(n|P|^{1/3})$ if $|P| = O(n)$ (Theorem 3)                | $\Omega(r\mathcal{E}^{3/2})$ if $r = \Omega(\mathcal{E}^{3/2})$
$O(n^{2/3}|P|^{2/3})$ if $|P| = \Omega(n)$ (Theorem 3)     | $\Omega(r^{4/3}\mathcal{E})$ if $r = O(\mathcal{E}^{3/2})$
$O(n + n^{2/3}|P|^{2/3})$ (Conjecture 1)                   | $\Omega(r^{4/3}\mathcal{E})$
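The table entries follow mechanically from Definition 6. As a small sanity check (a sketch with a hypothetical helper name, not part of the paper), the exponents of r and of the density parameter can be computed exactly from a and b:

```python
from fractions import Fraction as F

def large_cluster_exponents(a, b):
    """Return (x, y) such that the Definition 6 threshold is r^x * E^y."""
    denom = 2 * b + a - 1
    return 2 * b / denom, 1 / denom

# (a, b) pairs corresponding to the preserver bounds listed in the table.
bounds = {
    "Coppersmith & Elkin": (F(1, 2), F(1)),
    "Theorem 3, |P| = O(n)": (F(1), F(1, 3)),
    "Theorem 3, |P| = Omega(n)": (F(2, 3), F(2, 3)),
}
for name, (a, b) in bounds.items():
    print(name, large_cluster_exponents(a, b))
```

Passing the constants as `Fraction` values keeps the arithmetic exact, so the printed pairs can be compared against the table without rounding concerns.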


Our choice of exponents is designed to push through the following lemma: 

Lemma 4. For each small cluster $X_i$ with center $v_i$, there is an integer $r_i \le \hat{r}_i \le 2r_i$ with

$$|B_{\le}(v_i, \hat{r}_i)|^a \left(|B_{=}(v_i, \hat{r}_i)|^2\right)^b = O(|B_{\le}(v_i, \hat{r}_i)| \, \mathcal{E}).$$

Proof. Suppose otherwise, towards a contradiction. Then we have

$$|B_{=}(v_i, \hat{r}_i)| > c \, |B_{\le}(v_i, \hat{r}_i)|^{(1-a)/(2b)} \mathcal{E}^{1/(2b)}$$

for all $r_i \le \hat{r}_i \le 2r_i$ and constants c. We can interpret this expression as a recurrence
relation on the size of $B_{\le}(v_i, \hat{r}_i)$ as $\hat{r}_i$ grows from $r_i + 1$ to $2r_i$ (denoted $S_{\hat{r}_i}$):

$$S_{r_i+1} \ge 1 \quad\text{and}\quad S_{k+1} \ge S_k + c \, S_k^{(1-a)/(2b)} \mathcal{E}^{1/(2b)}$$

And so

$$\Delta_k \ge c \, S_k^{(1-a)/(2b)} \mathcal{E}^{1/(2b)}$$

where $\Delta_k = S_{k+1} - S_k$. This is a discrete approximation of the differential equation

$$\frac{dS}{dk} \ge c \, \mathcal{E}^{1/(2b)} S_k^{(1-a)/(2b)}$$

which has the standard form $y'(x) = \alpha y(x)^{\beta}$ (in this case, $\alpha = c\mathcal{E}^{1/(2b)}$ and $\beta = (1-a)/(2b)$),
and so our discrete version enjoys the same asymptotics. The general solution to this
differential equation is $y = C_1 (\alpha x)^{1/(1-\beta)}$. Accordingly, for our discrete version, we gain:

$$S_{r_i+k} \ge c' \left(\mathcal{E}^{1/(2b)} k\right)^{1/(1-(1-a)/(2b))}$$

where c' is some new constant dependent on the old value of c. Algebraic manipulation now
yields

$$S_{r_i+k} \ge c' \left(\mathcal{E}^{1/(2b)} k\right)^{2b/(2b+a-1)}$$

$$S_{r_i+k} \ge c' \, \mathcal{E}^{1/(2b+a-1)} k^{2b/(2b+a-1)}$$

$$S_{2r_i} \ge c' \, \mathcal{E}^{1/(2b+a-1)} r_i^{2b/(2b+a-1)} \ge c' \, \mathcal{E}^{1/(2b+a-1)} r^{2b/(2b+a-1)}$$

If we choose c such that c' is sufficiently large, this contradicts the assumption that $X_i$
is small. □
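For readers who want the intermediate step, the solution of the differential equation used above can be obtained by separation of variables. The following worked derivation is a sketch under the assumption $\beta < 1$ (equivalently $2b + a > 1$), with C denoting an integration constant:

```latex
\frac{dy}{dx} = \alpha\, y^{\beta}
\;\Longrightarrow\;
\int y^{-\beta}\, dy = \int \alpha\, dx
\;\Longrightarrow\;
\frac{y^{1-\beta}}{1-\beta} = \alpha x + C
\;\Longrightarrow\;
y = \bigl((1-\beta)(\alpha x + C)\bigr)^{1/(1-\beta)} .

% With \beta = (1-a)/(2b):
\frac{1}{1-\beta} = \frac{2b}{2b+a-1},
\qquad
\alpha^{1/(1-\beta)} = \bigl(c\,\mathcal{E}^{1/(2b)}\bigr)^{2b/(2b+a-1)}
  = c^{2b/(2b+a-1)}\, \mathcal{E}^{1/(2b+a-1)} .
```

This is where the exponents $2b/(2b+a-1)$ on the radius and $1/(2b+a-1)$ on $\mathcal{E}$ in the definition of a large cluster come from.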

This lemma is the heart of our reduction from spanners to distance preservers, and it is 
the entire reason we have gone through the trouble to build our own clustering algorithm. 
The idea is that, for each cluster, one of the following two cases must happen: (1) each 
subsequent layer of nodes around the core represents a significant growth in the cluster size, 
or (2) one of these layers L is unusually small, and therefore it is “cheap” to make a distance 
preserver on the pair set L x L. 

Lemma 5. Let X be a large cluster. Let Q be a set of node pairs contained in X. If
$|Q| = O(r^{2(1-a)/(2b+a-1)} \mathcal{E}^{2/(2b+a-1)})$, then there is a tiebreaking scheme $\rho_X$ such that

$$\Bigl|\bigcup_{q \in Q} \rho_X(q)\Bigr| = O(|X|\mathcal{E})$$


Another reference table:

Using the distance preserver bound                         | Q has size
$O(n + n^{1/2}|P|)$ (Coppersmith & Elkin [CE06])           | $O(r^{2/3}\mathcal{E}^{4/3})$
$O(n|P|^{1/3})$ if $|P| = O(n)$ (Theorem 3)                | $O(\mathcal{E}^{3})$ if $r = \Omega(\mathcal{E}^{3/2})$
$O(n^{2/3}|P|^{2/3})$ if $|P| = \Omega(n)$ (Theorem 3)     | $O(r^{2/3}\mathcal{E}^{2})$ if $r = O(\mathcal{E}^{3/2})$
$O(n + n^{2/3}|P|^{2/3})$ (Conjecture 1)                   | $O(r^{2/3}\mathcal{E}^{2})$


Proof. Observe that

$$|Q| = O\left(\left(r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}\right)^{(1-a)/b} \mathcal{E}^{1/b}\right)$$

Since X is large, we have $|X| \ge r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}$. Therefore

$$|Q| = O\left(|X|^{(1-a)/b} \mathcal{E}^{1/b}\right)$$

By definition of a and b, we can create a distance preserver for this pair set in the subgraph
X on $O(|X|^a |Q|^b)$ edges. We then have

$$O(|X|^a |Q|^b) = O(|X|\mathcal{E})$$

as claimed. □
[Figure 5, four panels:
(a) Perhaps each subsequent ring around the core contains a lot of nodes. In this case, the
size of the entire cluster must be fairly big, and so the cluster is classified as “large.”
(b) Alternately, perhaps there exists a specific ring around the core that doesn’t contain
very many nodes. In this case ...
(c) ...we restrict attention to the subgraph of nodes contained in this small ring. Because
the ring is small, it is not very expensive to add a distance preserver on all pairs of nodes
in this ring.
(d) Now, every time a shortest path enters and leaves the cluster, we have already handled
all the edges of this path inside the small ring.]

Figure 5: A graphical depiction of the reduction between distance preservers and graph
clustering.
4.2 Path Decomposition

Before we proceed to our spanner algorithms, we will discuss a useful method for dividing 
paths into easy-to-analyze subpaths. 

Lemma 6. Let G be a graph and $\rho$ be a shortest path in G. Let $\{v_i, r_i\}$ be a clustering of
G as in Lemma 3. One can partition $\rho$ into subpaths $\{\rho_1, \dots, \rho_k\}$ such that every subpath
$\rho_i$ can be classified into one of two cases:

1. A small subpath, for which every edge in $\rho_i$ is incident on some small cluster core $C_i$.

2. A large subpath, in which every node is in a large cluster $X_i$.

Additionally, one can assign large clusters X to large subpaths $\rho_i$ with $\rho_i \subseteq X$ such that no
two subpaths correspond to the same large cluster.

Proof. Let u and v be the first and last nodes of $\rho$. Choose an i such that u is in $C_i$. If
$X_i$ is small, then let w be the first node in $\rho$ that is not also in $C_i$. Otherwise, if $X_i$ is
large, then let w be the last node in $X_i$ such that $\rho_G(u, w) \subseteq X_i$. In either case, add
$\rho_G(u, w)$ to your list of subpaths, and then repeat the analysis on $\rho_G(w, v)$ (if this subpath
is nonempty). Note that $w \ne u$: in the small case $u \in C_i$ but $w \notin C_i$, and in the large case
the core is padded inside $X_i$, so the part of $\rho$ inside $X_i$ extends beyond u. Hence this
process will eventually terminate.

The only nontrivial detail to prove is that this process will never select the same large
cluster $X_i$ twice. Suppose towards a contradiction that a large cluster $X_i$ is selected twice;
then $\rho$ must include a node $c \in C_i$, then a node $x \notin X_i$, then another node $c' \in C_i$ in that
order. We know $\delta_G(c, x) > r_i$ and $\delta_G(c', x) > r_i$, because $c, c' \in B(v_i, r_i)$ but $x \notin B(v_i, 2r_i)$.
Since $\rho$ is a shortest path passing through x between c and c', this implies that
$\delta_G(c, c') \ge 2r_i + 2$. However, we also have $\delta_G(c, v_i) \le r_i$ and $\delta_G(c', v_i) \le r_i$,
which implies that $\delta_G(c, c') \le 2r_i$. These statements are contradictory, so instead it must
be the case that no large cluster is ever selected twice. □
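The decomposition in this proof is a single left-to-right scan of the path. The sketch below illustrates it under assumed data structures that are not from the paper: the path is a list of nodes, and each cluster is a dictionary with a `core` set, a `cluster` set and a `large` flag. The function name `decompose` is hypothetical.

```python
def decompose(path, clusters):
    """Split `path` into consecutive subpaths, each tagged with a cluster index.

    Assumes every node lies in some core and that the node following a core
    node is still inside that core's cluster (both hold for a valid clustering).
    """
    pieces = []
    i = 0
    while i < len(path) - 1:
        u = path[i]
        # Choose a cluster whose core contains the current start node.
        c = next(k for k, cl in enumerate(clusters) if u in cl["core"])
        j = i + 1  # always advance at least one step, so w != u
        if clusters[c]["large"]:
            # Large: extend while the next node stays inside the cluster X_c.
            while j + 1 < len(path) and path[j + 1] in clusters[c]["cluster"]:
                j += 1
        else:
            # Small: stop at the first node that has left the core C_c.
            while j < len(path) - 1 and path[j] in clusters[c]["core"]:
                j += 1
        pieces.append((path[i:j + 1], c))
        i = j  # the next subpath starts at w
    return pieces

# Toy example: a path 0..9 covered by a small, a large, and a small cluster.
toy_clusters = [
    {"core": {0, 1}, "cluster": {0, 1, 2, 3}, "large": False},
    {"core": {2, 3, 4}, "cluster": {1, 2, 3, 4, 5, 6}, "large": True},
    {"core": {6, 7, 8, 9}, "cluster": {5, 6, 7, 8, 9}, "large": False},
]
toy_pieces = decompose(list(range(10)), toy_clusters)
```

Consecutive subpaths share their boundary node w, so the edges of the path are partitioned even though the node lists overlap at one endpoint.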

We use this decomposition to classify the edges of each path as follows. 

Definition 7. Let $\rho_G(u, v)$ be a path that has been decomposed into subpaths $\{\rho_1, \dots, \rho_k\}$
as in Lemma 6. Then we classify the subpaths as follows:

1. An extreme subpath is a subpath that belongs to a cluster X such that $u \in X$ or
$v \in X$.

2. A small subpath is a non-extreme subpath that belongs to a small cluster X.

3. A large subpath is a non-extreme subpath that belongs to a large cluster X.
Figure 6: How to decompose a shortest path $\rho_G(u, v)$ over a graph clustering (Lemma 6).

5 Applications to Additive Spanners

5.1 Subset Spanners 

Recall the following definitions from the introduction: 

Definition 8. A subgraph H is a $+\beta$ subset spanner of a graph G and a node subset S if

$$\delta_H(u, v) \le \delta_G(u, v) + \beta$$

for all $u, v \in S$.

We will use Algorithm 1 to generate our subset spanners. It is trivially true that the
output of this algorithm is a $+n^d$ subset spanner of G, S; we omit this proof. We will now
prove an upper bound on the number of edges in the graph H returned by this algorithm.

Overview of the Edge Bound. Take the set $S \times \{X_i\}$, where $X_i$ are clusters in some
clustering of G. Think of each element of this set as “unmarked.” Whenever we add a
shortest path to H with endpoint $s \in S$ that intersects a certain cluster X, we then “mark”
the pair (s, X). Whenever we add a path $\rho_G(s_1, s_2)$ to H, each cluster that intersects
$\rho_G(s_1, s_2)$ will be marked along with either $s_1$ or $s_2$, because otherwise we have already
accurately spanned the pair $(s_1, s_2)$ (details of this argument are in Lemma 7).

We then argue that (1) not very many of the edges in H are added by extreme subpaths,
(2) the total cost of the small subpaths can be bounded by our distance preserver reduction
(see Lemma 4 or Figure 5), and (3) we only add |S| large subpaths per large cluster, and so
the total cost of the large subpaths can be bounded by Lemma 5.

We will now proceed with the proof. 

Lemma 7. Let $\{v_i, r_i\}$ be a clustering of G as in Lemma 3 with parameter r chosen such
that $\max_i r_i \le n^d/(8 \log n)$ (so $r = n^{d-o(1)}$). For each cluster $X_i$, Algorithm 1 will add at
most |S| paths to H that are incident on $X_i$.

Proof. Consider each pair $s_1, s_2 \in S$ in turn. Let $\rho$ be any shortest path between $s_1$ and $s_2$
in G, and let $\{\rho_1, \dots, \rho_k\}$ be a decomposition of $\rho$ as in Lemma 6. First, suppose that for
some cluster $X_i$, we have already added shortest paths to H with endpoints $s_1$ and $s_2$ that
intersect $X_i$. In this case, we claim that we already have $\delta_H(s_1, s_2) \le \delta_G(s_1, s_2) + n^d$, and
therefore, we will skip adding $\rho_G(s_1, s_2)$ to H in the algorithm. To see this, let $x_1, x_2 \in X_i$
such that there is a shortest path between the pairs $s_1, x_1$ and $s_2, x_2$ already in H. By the
triangle inequality, we have:

$$\delta_H(s_1, s_2) \le \delta_H(s_1, x_1) + \delta_H(x_1, x_2) + \delta_H(x_2, s_2)$$

$$\delta_H(s_1, s_2) \le \delta_G(s_1, x_1) + (n^d/2) + \delta_G(x_2, s_2)$$

Let $x_3$ be any node in $X_i$ intersected by $\rho$. Then

$$\delta_H(s_1, s_2) \le (\delta_G(s_1, x_3) + n^d/(8 \log n)) + n^d/2 + (\delta_G(x_3, s_2) + n^d/(8 \log n))$$

$$\delta_H(s_1, s_2) \le \delta_G(s_1, s_2) + n^d$$

Therefore, each time we add a path $\rho_G(s_1, s_2)$ to H, for each cluster $X_i$ intersected by
$\rho_G(s_1, s_2)$, we know that $\rho_G(s_1, s_2)$ is either (1) the first path with endpoint $s_1$ that intersects
$X_i$ added to H, or (2) the first path with endpoint $s_2$ that intersects $X_i$ added to H. Since
there are only |S| possible endpoints, the lemma follows. □
Algorithm 1: subspan(G, S, d > 0)

Initialize H to be a $\cdot \log n$ multiplicative spanner of G;
for each pair $s_1, s_2 \in S$ (in some fixed order) do
    if $\delta_H(s_1, s_2) > \delta_G(s_1, s_2) + n^d$ then
        Add all edges in $\rho_G(s_1, s_2)$ to H;
    end
end
return H;
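A direct, unoptimized rendering of Algorithm 1 is sketched below. Several choices here are assumptions made for illustration only: nodes are comparable labels (e.g. integers), the graph is an adjacency dictionary, the multiplicative spanner is built with the standard greedy rule (keep an edge only if its endpoints are currently farther apart than the stretch), and BFS parent pointers play the role of the tiebreaking scheme. The helper names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations

def bfs(adj, src):
    """Hop distances and BFS parents from `src` in an unweighted graph."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], parent[y] = dist[x] + 1, x
                queue.append(y)
    return dist, parent

def add_edge(h, u, v):
    h[u].add(v)
    h[v].add(u)

def greedy_spanner(adj, stretch):
    """Greedy multiplicative spanner: every edge of G ends up stretched by <= stretch."""
    h = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            if u < v and bfs(h, u)[0].get(v, math.inf) > stretch:
                add_edge(h, u, v)
    return h

def subspan(adj, S, d):
    """Sketch of Algorithm 1: returns H as an adjacency dictionary of sets."""
    n = len(adj)
    budget = n ** d  # the additive error allowance +n^d
    h = greedy_spanner(adj, max(1.0, math.log2(n)))
    for s1, s2 in combinations(sorted(S), 2):  # some fixed order
        dist_g, parent = bfs(adj, s1)
        if s2 not in dist_g:
            continue  # pair lies in different components
        if bfs(h, s1)[0].get(s2, math.inf) > dist_g[s2] + budget:
            # Add every edge of one fixed shortest path between s1 and s2.
            x = s2
            while parent[x] is not None:
                add_edge(h, x, parent[x])
                x = parent[x]
    return h
```

Because H only grows, a pair that passes the distance test once stays within the $+n^d$ allowance for the rest of the run, which is why the correctness claim is immediate; the sketch makes no attempt to reproduce the edge bound, which depends on the tiebreaking scheme chosen in the analysis.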



[Figure 7, two panels:
(a) Let $X_i$ be a cluster intersected by $\rho_G(s_1, s_2)$. If there are already two shortest paths
through $X_i$ with endpoints at $s_1$ and $s_2$...
(b) ...then there is already a path between $s_1$ and $s_2$ in H with only $+n^d$ stretch, so
Algorithm 1 will not choose to add $\rho_G(s_1, s_2)$ to H.]

Figure 7: A graphical depiction of the proof of Lemma 7.


Theorem 4. For all G, there is a tiebreaking scheme $\rho_G$ such that the graph H returned
by Algorithm 1 has size

$$|H| = \tilde{O}(n) + |S|^{(2b+a-1)/2} \, n^{1-d(1-a)+o(1)}$$


Another reference table:

Using the distance preserver bound                         | H has size $\tilde{O}(n)$ +
$O(n^{1/2}|P|)$ (Coppersmith & Elkin [CE06])               | $|S|^{3/4} n^{1-d/2+o(1)}$
$O(n|P|^{1/3})$ if $|P| = O(n)$ (Theorem 3)                | $|S|^{1/3} n^{1+o(1)}$ if $|S| = O(n^{2d})$
$O(n^{2/3}|P|^{2/3})$ if $|P| = \Omega(n)$ (Theorem 3)     | $|S|^{1/2} n^{1-d/3+o(1)}$ if $|S| = \Omega(n^{2d})$
$O(n + n^{2/3}|P|^{2/3})$ (Conjecture 1)                   | $|S|^{1/2} n^{1-d/3+o(1)}$


Proof. It is well known [ADD+93] that a $\cdot \log n$ multiplicative spanner requires $O(n)$ edges.

The remaining edges in H are all the result of adding paths $\rho_G(u, v)$. Once again, let
$\{v_i, r_i\}$ be a clustering of G with parameter r chosen such that $\max_i r_i \le n^d/(8 \log n)$ (so
$r = n^{d-o(1)}$). Each of our paths can be decomposed over this clustering. We will say that
an edge $e \in H$ is extreme, small, or large depending on whether the decomposed subpath
$\rho_i$ that first added e to H is classified as extreme, small, or large as in Definition 7.
We will now count the three types of edges separately.

Extreme Edges. Since there is a $\cdot \log n$ multiplicative spanner already in H, and every
path $\rho$ added to H is not spanned up to $+n^d$ accuracy at the time it is added, we know that
$\rho$ is missing at least $n^d/\log n$ edges in total. Each cluster has radius at most $n^d/(8 \log n)$, so
jointly, the two clusters in which $\rho$ begins and ends contribute at most $n^d/(2 \log n)$ of these
missing edges. So at most half of the total edges in H fall into this category. It therefore
suffices to prove the edge bound for the other two types of edges.

Small Edges. For each small edge e, we know that e was a part of a subpath $\rho_i$ owned
by a small cluster $X_i$, and that $\rho_i$ was a part of a larger path $\rho_G(u, v)$ that did not start or
end in $X_i$. Choose $\hat{r}_i$ as in Lemma 4; then there are nodes $x \ne x' \in B_{=}(v_i, \hat{r}_i) \cap \rho$ such that
$x, x' \in \rho_G(u, v)$ and e is between x and x' in $\rho_G(u, v)$. Therefore, $e \in \rho_{X_i}(x, x')$. We can
then cover all small edges belonging to $X_i$ using a single distance preserver on $B_{=}(v_i, \hat{r}_i)$
within the subgraph $B_{\le}(v_i, \hat{r}_i)$. By Lemma 4 with the proper tiebreaking scheme, this
requires $O(|B_{\le}(v_i, \hat{r}_i)| \mathcal{E})$ edges. So the total number of small edges in the entire graph is

$$\sum_{i \,:\, X_i \text{ is small}} O(|B_{\le}(v_i, \hat{r}_i)| \mathcal{E}) = \mathcal{E} \sum_{X_i \text{ is small}} O(|X_i|) = \tilde{O}(n\mathcal{E})$$

where again the last equality follows from Lemma 3.

Large Edges. For each path $\rho_G(s_1, s_2)$ added to H by Algorithm 1, when we decompose
these paths as in Lemma 6, we know from Lemma 7 that a total of |S| or fewer subpaths
will be assigned to each large cluster. By Lemma 5 with the proper tiebreaking scheme,
the total number of distinct edges contained in the paths belonging to a single large cluster
$X_i$ is only $O(|X_i|\mathcal{E})$, so long as

$$|S| = O\left(r^{2(1-a)/(2b+a-1)} \mathcal{E}^{2/(2b+a-1)}\right)$$

Some algebraic manipulation gives:

$$|S|^{(2b+a-1)/2} = O(r^{1-a}\mathcal{E})$$

$$|S|^{(2b+a-1)/2} \, n \, r^{a-1} = O(n\mathcal{E})$$

Recall that $r = n^{d-o(1)}$, so

$$|S|^{(2b+a-1)/2} \, n^{1+o(1)-d(1-a)} = O(n\mathcal{E})$$

So if this condition holds, then the total number of large edges in H is:

$$\sum_{X_i \text{ is large}} O(|X_i|\mathcal{E}) = \mathcal{E} \sum_{X_i \text{ is large}} O(|X_i|) = O\Bigl(\mathcal{E} \sum_i |X_i|\Bigr) = \tilde{O}(n\mathcal{E})$$

where the last equality follows from Lemma 3.
Total. The total number of edges in H is then $2 \cdot (\tilde{O}(n\mathcal{E}) + \tilde{O}(n\mathcal{E})) = \tilde{O}(n\mathcal{E})$, assuming
from the Large Edges case that

$$|S|^{(2b+a-1)/2} \, n^{1+o(1)-d(1-a)} = O(n\mathcal{E})$$

We conclude that the total number of edges in H is $|S|^{(2b+a-1)/2} \, n^{1+o(1)-d(1-a)}$. □

5.2 Standard Spanners 

Recall the following definition from the introduction: 

Definition 9. A subgraph H is a $+\beta$ (standard) spanner of a graph G if

$$\delta_H(u, v) \le \delta_G(u, v) + \beta$$

for all $u, v \in V$.

In other words, an additive spanner is a subset spanner with S = V. 


Algorithm 2: span(G, d)

Initialize H to be a $\cdot \log n$ multiplicative spanner of G;
Let $\mathcal{E} = n^{(a+2b-1)/(a+2b+1) - d(10b-a+1)/(3(a+2b+1))}$;
Let S be a random sample of $\Theta(\log n \cdot n^{1-d(2b-a+1)/(2b+a-1)} / \mathcal{E}^{(3-2b-a)/(2b+a-1)})$
    nodes in G  // The size of the constant in the $\Theta$ determines the
                // probability of the algorithm being correct
Add a $+n^d$ subset spanner of G, S to H;
for each pair $u, v \in V$ such that $\delta_H(u, v) > \delta_G(u, v) + 8n^d$ do
    Let $x_u$ be the first node in $\rho_G(u, v)$ with the property that there exists $s \in S$
    with $\delta_G(s, x_u) \le n^d/\log n$ and let $x_v$ be the last such node;
    Add $\rho_G(u, x_u)$ and $\rho_G(v, x_v)$ to H;
end
return H;


We generate our spanners using Algorithm 2.

Lemma 8. The output of Algorithm 2 is a $+O(n^d)$ spanner of G.

Proof. Consider each pair $u, v \in V$. If we decided not to add paths $\rho_G(u, x_u)$ and $\rho_G(v, x_v)$,
then it must be the case that $\delta_H(u, v) \le \delta_G(u, v) + 8n^d$. If we did add paths $\rho_G(u, x_u)$ and
$\rho_G(v, x_v)$, then let $s_u$ be the node in S within distance $n^d/\log n$ of $x_u$, and let $s_v$ be the
same for $x_v$. From the triangle inequality, we have:

$$\delta_H(u, v) \le \delta_H(u, x_u) + \delta_H(x_u, s_u) + \delta_H(s_u, s_v) + \delta_H(s_v, x_v) + \delta_H(x_v, v)$$

We know that $\delta_G(x_u, s_u) \le n^d/\log n$. We have a $\cdot \log n$ multiplicative spanner of G in H,
so that gives $\delta_H(x_u, s_u) \le n^d$. The same argument holds for $\delta_H(x_v, s_v)$. Additionally, due
to our subset spanner, we have $\delta_H(s_u, s_v) \le \delta_G(s_u, s_v) + n^d$. We then have:

$$\delta_H(u, v) \le \delta_G(u, x_u) + n^d + (\delta_G(s_u, s_v) + n^d) + n^d + \delta_G(x_v, v)$$

By the triangle inequality, we have $\delta_G(s_u, s_v) \le \delta_G(x_u, x_v) + O(n^d)$. Therefore,

$$\delta_H(u, v) \le \delta_G(u, x_u) + \delta_G(x_u, x_v) + \delta_G(x_v, v) + O(n^d)$$

Since $x_u, x_v$ lie on $\rho_G(u, v)$, this implies

$$\delta_H(u, v) \le \delta_G(u, v) + O(n^d)$$

□


We now need to prove the edge bound. 

Overview of the Edge Bound. For each of the paths $\rho_G(u, x_u)$ that we add to H, we
can bound the cost of its extreme subpaths and its small subpaths exactly like we did in our
subset spanner. The only challenging part of this proof is the bound on the cost of the large
subpaths. Think about a specific large cluster X. If it contains only a few large subpaths,
then we can upper bound its density using Lemma 5. If it contains many large subpaths,
then we can argue that the average cost of one of these large subpaths is fairly small. We
then make another distinction: a heavy subpath is one that contributes a lot of edges to X,
and a light subpath is one that is fairly cheap to add to X. Heavy subpaths are rare, and
so they don’t contribute very many edges in total. Light subpaths mean that the path has
lots of nodes in its neighborhood (all of X) for a relatively small number of missing edges;
therefore, by the time the path is missing $O(n^d)$ edges, its neighborhood is very large. That
makes it likely that there is a node $s \in S$ in this neighborhood.

We will now start to prove the bound more formally. First, we make the following
refinement of Definition 7.

Definition 10. Let $H \subseteq G$. We say that a large subpath $\rho$, owned by large cluster X, is a
heavy subpath if the number of edges in $\rho$ but not H is at least

$$|X|^{(b+a-1)/b} \, \mathcal{E}^{(b-1)/b}$$

Otherwise, $\rho$ is a light subpath.

The purpose of this definition is: 

Lemma 9. There exists a tiebreaking scheme $\rho_G$ such that the following statement is true:

Let $H \subseteq G$. Let Q be a sequence of node pairs that are all contained in the same large
cluster X. Suppose we add $\rho_X(q)$ to H in some order for all $q \in Q$. Then only $O(|X|\mathcal{E})$
edges will be added to H by a heavy path.

Proof. When you consider a certain pair $q \in Q$, if there exists a light shortest path between
its endpoints, then add that particular path to H; this pair q then does not contribute any
edges to the heavy path edge count.

We are left to bound the edges only of those pairs whose path is heavy; suppose there
are h such pairs in total. We will next prove that $h = O(|X|^{(1-a)/b} \mathcal{E}^{1/b})$. Suppose otherwise,
towards a contradiction (so $h = \omega(|X|^{(1-a)/b} \mathcal{E}^{1/b})$). Choose $\rho_X$ to implement a distance
preserver on $O(|X|^a h^b)$ edges on these pairs. The average number of edges contributed by
each pair is $O(|X|^a / h^{1-b})$, which is

$$O\left(|X|^a \big/ \omega\bigl((|X|^{(1-a)/b} \mathcal{E}^{1/b})^{1-b}\bigr)\right)$$

$$= o\left(|X|^a \big/ \bigl(|X|^{(1-b)(1-a)/b} \, \mathcal{E}^{(1-b)/b}\bigr)\right)$$

$$= o\left(|X|^{(b+a-1)/b} \, \mathcal{E}^{(b-1)/b}\right)$$

Note that this is smaller than the threshold for a path to be heavy. This implies that one
of our “heavy” pairs is in fact light - a contradiction. Therefore, $h = O(|X|^{(1-a)/b} \mathcal{E}^{1/b})$.
Now, the cost of a distance preserver on this number of pairs is

$$O\left(|X|^a \bigl(|X|^{(1-a)/b} \mathcal{E}^{1/b}\bigr)^b\right) = O(|X|\mathcal{E})$$

edges, which proves the lemma. □

We need one more technical lemma: 

Lemma 10. In Algorithm 2, whenever we add $\rho_G(u, x_u)$ and $\rho_G(v, x_v)$ to H for some pair
$u, v \in V$, there are at least $n^d/\log n$ edges missing from H in $\rho_G(u, x_u) \cup \rho_G(v, x_v)$.

Proof. Suppose towards a contradiction that $\rho_G(u, x_u) \cup \rho_G(v, x_v)$ are missing at most
$n^d/\log n$ edges in H. By the triangle inequality, we have:

$$\delta_H(u, v) \le \delta_H(u, x_u) + \delta_H(x_u, s_u) + \delta_H(s_u, s_v) + \delta_H(s_v, x_v) + \delta_H(x_v, v)$$

Since H contains a $\cdot \log n$ spanner of G, our hypothesis implies that $\delta_H(u, x_u) + \delta_H(v, x_v) \le
\delta_G(u, x_u) + \delta_G(v, x_v) + n^d$. Similarly, $\delta_H(x_u, s_u) \le \delta_G(x_u, s_u) + n^d$, since the distance between
$x_u$ and $s_u$ is at most $n^d/\log n$ (and similarly for $\delta_H(x_v, s_v)$). Finally, we have $\delta_H(s_u, s_v) \le
\delta_G(s_u, s_v) + n^d$, because H contains a $+n^d$ subset spanner of S. We now have

$$\delta_H(u, v) \le (\delta_G(u, x_u) + \delta_G(v, x_v) + n^d) + (\delta_G(x_u, s_u) + n^d) + (\delta_G(s_u, s_v) + n^d) + (\delta_G(s_v, x_v) + n^d)$$

$$\delta_H(u, v) \le \delta_G(u, x_u) + \delta_G(x_u, s_u) + \delta_G(s_u, s_v) + \delta_G(s_v, x_v) + \delta_G(x_v, v) + 4n^d$$

Another application of the triangle inequality gives that $\delta_G(x_u, s_u) + \delta_G(s_u, s_v) + \delta_G(s_v, x_v) \le
\delta_G(x_u, x_v) + n^d$. We then have

$$\delta_H(u, v) \le \delta_G(u, x_u) + \delta_G(x_u, x_v) + \delta_G(x_v, v) + 5n^d$$

$$\delta_H(u, v) \le \delta_G(u, v) + 5n^d$$

and so the pair u, v has already been spanned accurately enough, and so we will not add
$\rho_G(u, x_u)$ or $\rho_G(v, x_v)$ to H. This is a contradiction, and so it must be the case that
$\rho_G(u, x_u) \cup \rho_G(v, x_v)$ is missing more than $n^d/\log n$ edges in H. □

We can now prove: 

Lemma 11. For all G, there is a tiebreaking scheme $\rho_G$ such that Algorithm 2 returns a
graph on $n^{1+o(1)+(a+2b-1)/(a+2b+1) - d(10b-a+1)/(3(a+2b+1))}$ edges.

Proof. Recall that

$$\mathcal{E} = n^{(a+2b-1)/(a+2b+1) - d(10b-a+1)/(3(a+2b+1))}$$

and so it suffices to prove that there are $n^{1+o(1)}\mathcal{E}$ edges in the graph returned by Algorithm 2.

Once again, the $\cdot \log n$ multiplicative spanner costs only $O(n)$ edges. The total cost of
the subset spanner, implemented with Theorem 4, is

$$n^{1-d/3} \left(\Theta\bigl(\log n \cdot n^{1-d(2b-a+1)/(2b+a-1)} / \mathcal{E}^{(3-2b-a)/(2b+a-1)}\bigr)\right)^{1/2}$$

$$= \tilde{O}\left(n^{1-d/3} \bigl(n^{1/2-d(2b-a+1)/(2(2b+a-1))} / \mathcal{E}^{(3-2b-a)/(2(2b+a-1))}\bigr)\right)$$

One can verify that

$$n\mathcal{E} = n^{1-d/3} \bigl(n^{1/2-d(2b-a+1)/(2(2b+a-1))} / \mathcal{E}^{(3-2b-a)/(2(2b+a-1))}\bigr)$$

as follows:

$$\mathcal{E}^{1+(3-2b-a)/(2(2b+a-1))} = n^{-d/3} \bigl(n^{1/2-d(2b-a+1)/(2(2b+a-1))}\bigr)$$

$$\mathcal{E}^{(1+2b+a)/(2(2b+a-1))} = n^{1/2-d(1/3+(2b-a+1)/(2(2b+a-1)))}$$

$$\mathcal{E}^{(1+2b+a)} = n^{(2b+a-1)-d(2(2b+a-1)/3+(2b-a+1))}$$

$$\mathcal{E}^{(1+2b+a)} = n^{(2b+a-1)-d(10b-a+1)/3}$$

Substituting in $\mathcal{E} = n^{(a+2b-1)/(a+2b+1) - d(10b-a+1)/(3(a+2b+1))}$:

$$n^{(a+2b-1)-d(10b-a+1)/3} = n^{(2b+a-1)-d(10b-a+1)/3}$$

which is true, and so the subset spanner fits within our edge budget. We now need to bound
the edges added by paths $\rho_G(u, x_u)$ and $\rho_G(v, x_v)$. We will imagine a clustering $\{v_i, r_i\}$ of
G with r chosen such that $\max_i r_i \le n^d/(32 \log n)$. Once again, we will say that an edge is
Extreme/Small/Large (and that a large edge is heavy or light) based on the classification
of the subpath of $\rho_G(u, x_u)$ that first added this edge to H. We will again count each edge
type separately.

Extreme Edges. There are at most $n^d/(2 \log n)$ extreme edges in $\rho_G(u, x_u) \cup \rho_G(v, x_v)$
(they belong to four clusters - at the beginning and end of $\rho_G(u, x_u)$ and $\rho_G(v, x_v)$ - and
each cluster has diameter $n^d/(8 \log n)$). Further, from Lemma 10 we know that $\rho_G(u, x_u) \cup
\rho_G(v, x_v)$ is missing at least $n^d/\log n$ edges.

We conclude that only a constant fraction of the total edges in H are extreme, and so it
suffices to prove our edge bound for the remaining cases.

Small Edges. This case is identical to the Small Edges case in Theorem 4.

Large Edges. Large edges can be either heavy or light. By Lemma 9, each large cluster
owns only $O(|X_i|\mathcal{E})$ heavy edges, and so the total number of heavy edges is

$$\sum_{X_i \text{ is large}} O(|X_i|\mathcal{E}) = \mathcal{E} \sum_{X_i \text{ is large}} O(|X_i|) = \tilde{O}(n\mathcal{E})$$

To bound the number of light edges, we will argue that there are more heavy edges
than there are light edges and so the same bound applies. To see this, assume towards a
contradiction that there are more light edges than heavy edges. From Lemma 10, at least
$n^d/\log n$ edges are missing in $\rho_G(u, x_u) \cup \rho_G(v, x_v)$. Suppose at least half these edges are
light, and let $\mathcal{L}$ be the set of large clusters that own a light subpath of $\rho_G(u, x_u)$ or $\rho_G(v, x_v)$.
Suppose that all the clusters in $\mathcal{L}$ have the minimum possible size for a large cluster; that
is, for all $L \in \mathcal{L}$ we have $|L| = r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}$ (we will later show that this is a
worst-case assumption). Then we have:

$$|\mathcal{L}| \ge \frac{n^d}{2 \log n} \Big/ \left(\bigl(r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}\bigr)^{(b+a-1)/b} \, \mathcal{E}^{(b-1)/b}\right)$$

$$|\mathcal{L}| \ge \frac{n^d}{2 \log n} \Big/ \left(r^{2(b+a-1)/(2b+a-1)} \, \mathcal{E}^{(b+a-1)/(b(2b+a-1))} \, \mathcal{E}^{(b-1)/b}\right)$$

$$|\mathcal{L}| \ge \frac{n^d}{2 \log n} \Big/ \left(r^{2(b+a-1)/(2b+a-1)} \, \mathcal{E}^{(2b+a-2)/(2b+a-1)}\right)$$

And so

$$\sum_{L \in \mathcal{L}} |L| \ge \frac{n^d}{2 \log n} \Big/ \left(r^{2(b+a-1)/(2b+a-1)} \, \mathcal{E}^{(2b+a-2)/(2b+a-1)}\right) \cdot r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}$$

$$\sum_{L \in \mathcal{L}} |L| \ge \frac{n^d}{2 \log n} \, r^{2(1-a)/(2b+a-1)} \, \mathcal{E}^{(3-2b-a)/(2b+a-1)}$$

We have $r = n^{d-o(1)}$, so

$$\sum_{L \in \mathcal{L}} |L| \ge n^{d(2b-a+1)/(2b+a-1)-o(1)} \, \mathcal{E}^{(3-2b-a)/(2b+a-1)}$$

Note that if our assumption fails - i.e. we have $|L| > r^{2b/(2b+a-1)} \mathcal{E}^{1/(2b+a-1)}$ - then by
convexity, our lower bound on $\sum_{L \in \mathcal{L}} |L|$ can only become stronger and so this inequality will
still hold.

Note, however, that the size of our random sample S is

$$\Omega\left(n \log n \big/ \bigl(n^{d(2b-a+1)/(2b+a-1)-o(1)} \, \mathcal{E}^{(3-2b-a)/(2b+a-1)}\bigr)\right)$$

and therefore, with high probability, there is a node $s \in S$ in some cluster $L \in \mathcal{L}$. This
implies that there is a node $s \in S$ within distance $< n^d/\log n$ of some node $w \in \rho_G(u, x_u) \cup
\rho_G(v, x_v)$ - a contradiction. We then have that the number of light edges is strictly less
than the number of heavy edges.


Total. This shows that the total number of edges in H is $n^{1+o(1)}\mathcal{E}$. By the previous
discussion, we have set $\mathcal{E}$ such that this bound suffices to prove the lemma. □

Jointly, Lemmas 8 and 11 imply:

Theorem 5. Algorithm 2 produces $+O(n^d)$ spanners. For all graphs G, there is a tiebreaking
scheme $\rho_G$ such that its output graph has $n^{1+o(1)+(a+2b-1)/(a+2b+1)-d(10b-a+1)/(3(a+2b+1))}$
edges.


Here is a reference table for these exponents:

Using the distance preserver bound                         | The spanner has size
$O(n^{1/2}|P| + n)$ (Coppersmith & Elkin [CE06])           | $\tilde{O}(n^{10/7-d})$
$O(n|P|^{1/3})$ if $|P| = O(n)$ (Theorem 3)                | $\tilde{O}(n^{5/4-5d/12})$ if $d > 3/13$
$O(n^{2/3}|P|^{2/3})$ if $|P| = \Omega(n)$ (Theorem 3)     | $\tilde{O}(n^{4/3-7d/9})$ if $d \le 3/13$
$O(n + n^{2/3}|P|^{2/3})$ (Conjecture 1)                   | $\tilde{O}(n^{4/3-7d/9})$
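The entries of this table can be reproduced by substituting each pair (a, b) into the exponent of Theorem 5. The short check below is only a sketch with a hypothetical function name; it drops the o(1) term and uses exact fractions.

```python
from fractions import Fraction as F

def spanner_exponent(a, b, d):
    """Exponent of n in Theorem 5 for preserver constants (a, b), ignoring o(1)."""
    s = a + 2 * b
    return 1 + (s - 1) / (s + 1) - d * (10 * b - a + 1) / (3 * (s + 1))

d = F(1, 5)  # an arbitrary sample error exponent
print(spanner_exponent(F(1, 2), F(1), d))      # Coppersmith & Elkin: 10/7 - d
print(spanner_exponent(F(1), F(1, 3), d))      # Theorem 3, |P| = O(n): 5/4 - 5d/12
print(spanner_exponent(F(2, 3), F(2, 3), d))   # Theorem 3, |P| = Omega(n): 4/3 - 7d/9
```

Evaluating the two Theorem 3 rows at $d = 3/13$ gives the same exponent, which is why the table switches between them at that value of d.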